\documentclass[a4paper]{article}

\usepackage[english]{babel}
\usepackage[utf8]{inputenc}
\usepackage{amsmath}
\usepackage{graphicx}
\usepackage[colorinlistoftodos]{todonotes}
\usepackage{url}
\usepackage{hyperref}

\newcommand{\concatA}{%
  \mathbin{\raisebox{1ex}{\scalebox{.7}{$\frown$}}}%
}
\newcommand{\concatB}{%
  \mathbin{\rotatebox[origin=c]{90}{\scalebox{.7}{(\kern1ex)}}}
}
\title{Gitchain: Decentralized Git Hosting}

\author{Yurii Rashkovskii \texttt{yrashk@gmail.com}}

\date{\today}

\begin{document}
\maketitle

\begin{abstract}
This paper presents an approach for a decentralized storage of Git repositories as well as non-conflicting, fair name allocation and secure maintenance of push permissions to those repositories.
\end{abstract}

\section{Introduction}

Over past couple of years, increased usage of Git and similar tools has brought services like GitHub, Bitbucket and Gitorious to life. However, once a centralized service is experiencing a malfunction, network problem or a denial of service attack, wide reliance on it is severely impacting development workflow as a huge percentage of open source projects effectively go offline for the period of outage.

There has been at least one known attempt to create a decentralized git system, GitTorrent \cite{fonseca06}, however, it did not gain any significant traction and has been effectively abandoned.

However, recent developments such as Bitcoin or Namecoin, combined with older ones such as different variations of DHT and public key cryptography warrant another attempt to remove a single point of failure present in today's Git infrastructure.

In order to make the problem simpler, this paper splits it into a few distinct parts and discusses them separately, in the order of logical significance. The goal of this paper is to design the simplest possible system while (ideally) allowing improvements to be stacked on top of it. Some of these improvements might be described within the paper.

\section{General Architecture}

Gitchain is built around a couple of concepts, and is largerly inspired by Bitcoin and Namecoin.

Gitchain maintains its own \emph{blockchain}. However, there are some notable differences:

\begin{itemize}
\item \emph{No scripting language}. Instead, different transaction types are used. This simplifies its implementation and helps addressing the transaction malleability \cite{txnmal} issue.
\item \emph{Target block discovery time is 30 seconds \todo{Not a final number yet}}. Network difficulty is adjusted every 8640 blocks (or roughly 3 days) \todo{Final numbers will depend on target discovery time}
\end{itemize}

In addition to the blockchain, Gitchain uses a DHT technique to store Git objects in a distributed manner.

\section{Public Key Cryptography}

Gitchain is using elliptic public key cryptography to implement signing and verification, using secp256k1 curve \todo{This is not a final decision. There's a lot of interesting materials available on this subject: \cite{safecurves} \cite{trough} \cite{nistcurvesdangers}}, the same one used in
Bitcoin.

\section{Transaction}

Every transaction on the Gitchain blockchain has a SHA256 hash $TransactionHash := sha256(Transaction)$. In this paper, transactions are typically represented with its name and record-like structure:

\begin{verbatim}
SampleTransaction{Value: ...}
\end{verbatim}

\subsection{Envelope}

Every Gitchain transaction is enveloped with this structure:

\begin{table}[h]
\centering
\begin{tabular}{|l|l|l|}
\hline
\textbf{Field} & \textbf{Description}\\ \hline
PreviousEnvelopeHash & $EnvelopeHash(PreviousEnvelope)$ \\ \hline
Signature & $ECDSASignature(EnvelopeHash(Envelope))$ \\ \hline
PublicKey & Public key used in the signature \\ \hline
NextPublicKey & Next public key in the transaction chain \\ \hline
\end{tabular}
\end{table}


$$H := sha256(TransactionHash(Transaction) \concatA PreviousEnvelopeHash \concatA NextPublicKey)$$ where $Transaction$ is the transaction enveloped and is used to uniquely address that envelope.


\section{Mining}

In order to maintain the blockchain, Gitchain nodes mine blocks, similarly to Bitcoin or Namecoin. However, to avoid Bitcoin miners
with optimized hardware to attempt taking over Gitchain mining, a different algorithm should be chosen.

\subsection{Block Attribution Transaction (BAT)}

While a block can be mined without attribution, it is generally recommended that a miner includes the following transaction into every block it mines. It can be later used as a proof of work record to be redeemed in other transactions.

\begin{verbatim}
BlockAttribution{}
\end{verbatim}

These transactions should never be broadcasted independently of a block. If a node receives an out-of-block BAT transaction, it is discarded immediately.

\section{Name Allocation}

Before any repository can be addressed (and therefore pulled from or pushed to), it needs to have a unique name, with following properties:

\begin{enumerate}
\item A name has a unique textual (defined as \texttt{hpath} in RFC1738 \cite{rfc1738}) representation (to be used as a part of HTTP URL)
\item Name allocation happens on a first-come basis
\item Name can be deallocated (for repository removal or renaming)
\item Fair and reasonable name prefix allocation should be possible (for example, to claim your own base user identifier, for example \texttt{johndoe/} in \texttt{johndoe/foobar})
\end{enumerate}

\subsection{Name Reservation Transaction (NRT)}

Similarly to Namecoin, before a name can be allocated it has to be securely reserved. The following transaction need to mature (12 confirmations\todo{Not a final number}) before the name can be allocated.


\begin{verbatim}
NameReservation(Hash: Hash(rand, name))
\end{verbatim}

The motivation behind this is the same as in Namecoin, this is to prevent others from stealing your name before you had a chance to get it included and confirmed.

\subsection{Name Allocation Transaction (NAT)}

After the corresponding NRT has matured, a name allocation can happen:

\begin{verbatim}
NameAllocation(Name: name, Rand: rand)
\end{verbatim}

Name allocation is considered mature after 12 confirmations.

\subsection{Name Deallocation Transaction (NDT)}

After the NAT has matured, one can deallocate the corresponding name.

\begin{verbatim}
NameDeallocation(Name: name)
\end{verbatim}

\section{Permission Management}

Upon name allocation, the holder of the private key used to allocate the name is granted full permissions to the repository.

\section{Repository Transactions}

\subsection{Object Announcement Transaction (OAT)}

\begin{verbatim}
ObjectAnnouncement(Hash: hash, Type: blob|tree|commit)
\end{verbatim}

\subsection{Reference Update Transaction (RUT)}

\begin{verbatim}
ReferenceUpdate(Repository: repository, Ref: ref, Hash: hash)
\end{verbatim}

\section{Object Storage}

Object storage is where Gitchain goes outside of the blockchain boundaries.

\begin{thebibliography}{9}

\bibitem{fonseca06}
  Jonas Fonseca,
  \emph{GitTorrent: a P2P-based Storage Backend for git}, 2006.
  \url{http://jonas.nitro.dk/tmp/foo/gittorrent.html/main.html}

\bibitem{rfc1738}
  \emph{Uniform Resource Locators (URL)}.
  \url{http://www.ietf.org/rfc/rfc1738.txt}

\bibitem{safecurves}
  Daniel J. Bernstein, Tanja Lange,
  \emph{SafeCurves: choosing safe curves for elliptic-curve cryptography}
  \url{http://safecurves.cr.yp.to/}

\bibitem{trough}
  Fabio Pietrosanti,
  \emph{Not every elliptic curve is the same: trough on ECC security}
  \url{http://infosecurity.ch/20100926/not-every-elliptic-curve-is-the-same-trough-on-ecc-security/}

\bibitem{nistcurvesdangers}
  Daniel J. Bernstein, Tanja Lange,
  \emph{Security dangers of the NIST curves}
  \url{http://www.hyperelliptic.org/tanja/vortraege/20130531.pdf}

\bibitem{qzheng}
  Qingji Zheng, Shouhuai Xu
  \emph{Secure and Efficient Proof of Storage with Deduplication}
  \url{http://eprint.iacr.org/2011/529.pdf}

\bibitem{ateniese}
  Giuseppe Ateniese, Seny Kamara, Jonathan Katz
  \emph{Proofs of Storage from Homomorphic Identification Protocols}
  \url{http://www.cs.jhu.edu/~ateniese/papers/pos.pdf}

\end{thebibliography}

\end{document}
